NSF PAR Search | NSF Public Access Repository

Embodied visuomotor representation

https://doi.org/10.1038/s44182-025-00047-y

Burner, Levi; Fermüller, Cornelia; Aloimonos, Yiannis (December 2025, npj Robotics)
DeCroon, Guido (Ed.)
Imagine sitting at your desk, looking at objects on it. You do not know their exact distances from your eye in meters, but you can immediately reach out and touch them. Instead of an externally defined unit, your sense of distance is tied to your action’s embodiment. In contrast, conventional robotics relies on precise calibration to external units, with which vision and control processes communicate. We introduce Embodied Visuomotor Representation, a methodology for inferring distance in a unit implied by action. With it, a robot without knowledge of its size, environmental scale, or strength can quickly learn to touch and clear obstacles within seconds of operation. Likewise, in simulation, an agent without knowledge of its mass or strength can successfully jump across a gap of unknown size after a few test oscillations. These behaviors mirror natural strategies observed in bees and gerbils, which also lack calibration in an external unit.
Free, publicly-accessible full text available December 1, 2026
Learning Normal Flow Directly From Events

Yuan, Dehao; Burner, Levi; Wu, Jiayi; Liu, Minghui; Chen, Jingxi; Aloimonos, John; Fermüller, Cornelia (October 2025, CVF)

Event-based motion field estimation is an important task. However, current optical flow methods face challenges: learning-based approaches, often frame-based and relying on CNNs, lack cross-domain transferability, while model-based methods, though more robust, are less accurate. To address the limitations of optical flow estimation, recent works have focused on normal flow, which can be more reliably measured in regions with limited texture or strong edges. However, existing normal flow estimators are predominantly model-based and suffer from high errors. In this paper, we propose a novel supervised point-based method for normal flow estimation that overcomes the limitations of existing event learning-based approaches. Using a local point cloud encoder, our method directly estimates per-event normal flow from raw events, offering multiple unique advantages: 1) It produces temporally and spatially sharp predictions. 2) It supports more diverse data augmentation, such as random rotation, to improve robustness across various domains. 3) It naturally supports uncertainty quantification via ensemble inference, which benefits downstream tasks. 4) It enables training and inference on undistorted data in normalized camera coordinates, improving transferability across cameras. Extensive experiments demonstrate that our method achieves better and more consistent performance than state-of-the-art methods when transferred across different datasets. Leveraging this transferability, we train our model on the union of datasets and release it for public use. Finally, we introduce an egomotion solver based on a maximum-margin problem that uses normal flow and IMU to achieve strong performance in challenging scenarios. Code is available at github.com/dhyuan99/VecKM_flow.
Free, publicly-accessible full text available October 19, 2026
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

Chen, Jingxi; Feng, Brandon; Cai, Haoming; Wang, Tianfu; Burner, Levi; Yuan, Dehao; Fermüller, Cornelia; Metzler, Christopher A; Aloimonos, Yiannis (June 2025, IEEE)

Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate video. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance. This guidance allows EVFI methods to significantly outperform frame-only methods. However, to date, EVFI methods have relied on a limited set of paired event-frame training data, severely limiting their performance and generalization capabilities. In this work, we overcome the limited data challenge by adapting pre-trained video diffusion models trained on internet-scale datasets to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing methods and generalizes across cameras far better than existing approaches.
Free, publicly-accessible full text available June 21, 2026
TTCDist: Fast Distance Estimation From an Active Monocular Camera Using Time-to-Contact

https://doi.org/10.1109/ICRA48891.2023.10160683

Burner, Levi; Sanket, Nitin J.; Fermüller, Cornelia; Aloimonos, Yiannis (May 2023, 2023 IEEE International Conference on Robotics and Automation (ICRA 2023))

Distance estimation from vision is fundamental for a myriad of robotic applications such as navigation, manipulation, and planning. Inspired by the mammal’s visual system, which gazes at specific objects, we develop two novel constraints relating time-to-contact, acceleration, and distance that we call the τ-constraint and Φ-constraint. They allow an active (moving) camera to estimate depth efficiently and accurately while using only a small portion of the image. The constraints are applicable to range sensing, sensor fusion, and visual servoing. We successfully validate the proposed constraints with two experiments. The first applies both constraints in a trajectory estimation task with a monocular camera and an Inertial Measurement Unit (IMU). Our methods achieve 30-70% less average trajectory error while running 25× and 6.2× faster than the popular Visual-Inertial Odometry methods VINS-Mono and ROVIO, respectively. The second experiment demonstrates that when the constraints are used for feedback with efference copies, the resulting closed-loop system’s eigenvalues are invariant to scaling of the applied control signal. We believe these results indicate the τ- and Φ-constraints’ potential as the basis of robust and efficient algorithms for a multitude of robotic applications.
Full Text Available
